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Detection of cosmic filaments using the Candy model 
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Abstract. We propose to apply a marked point process to automatically delineate filaments of the large-scale structure in 
redshift catalogues. We illustrate the feasibility of the idea on an example of simulated catalogues, describe the procedure, and 
characterize the results. We find the distribution of the length of the filaments, and suggest how to use this approach to obtain 
other statistical characteristics of filamentary networks. 
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1. Introduction 

The large-scale structure of the Universe is studied by creating 
galaxy maps - positions of thousands (a few years ago) and 
millions (nowadays) of galaxies in space. The angular positions 
of galaxies are relatively easy to measure, but their distances 
can be estimated only by measuring their recession velocities. 
The latter task is difficult, especially for faint distant objects, 
and thus really detailed maps of galaxies have started to appear 
only lately. An additional caveat is that the recession velocities 
contain a contribution from the dynamical velocity of a galaxy, 
so the estimated distances define the so-called 'redshift space' . 

The dominant feature of these maps, as of all other galaxy 
maps of the large-scale structure of the universe, is the network 
of filaments of different sizes and contrast, along with relatively 
empty voids between the filaments. The filamentary network 
contains different scales, where smaller-scale filaments are also 
less prominent. The gradual disappearance of structures with 
increasing distance is due to the fact that the apparent lumi- 
nosity of a galaxy is the fainter the more distant it is, and in 
more distant regions we can observe only a few of the brightest 
galaxies. 

Clusters of galaxies lie at the intersections of filaments. 
Several authors have unambiguously dete cted filaments as 
inter-cluster structures. Using weak lensing. lGrav et al. (2002) 
found a fila ment co n nectin g Abell clusters A901 and A902, 
while iDietrich et alJ J2004). with a similar technique, found 
a filament be t ween A222 and A223. In a recent paper 
Pimb blet et all J2004 studied the frequency and the distribu- 
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tion of filaments in the 2dF galaxy redshift survey. They were 
motivated by the analysis of the filamentary pattern s of the A- 
CDM simulations reported by Colber g et alJ J2004h . Other ob- 
servational efforts have concluded with th e detection of inter- 
cluster filaments at vario us redshifts (Pimbbl et & DrinkwateJ 
l2004UEbeling et alJ2004l) . 

Although the filaments are prominent, there is no good 
method to describe such a filamentary structure. The usual sec- 
ond moment methods in real space or in the Fourier space (the 
two-point correlation function and power spectra) do not de- 
scribe well filamentary structures. The method that has been 
used most is the minim al spanning tree (MST, see a review in 
iMartfnez & SaaJE()0 2). The first application of the MST for- 
malism to describ e the filamen tary networks of galaxy maps 
was that of B arrow et alJ dl985h : many later studies have used 
it. 

The minimal spanning tree is unique for a given point set, 
which is good, and it connects all the points, which is not good. 
When the number of galaxies is large, the MST is rather fuzzy, 
and it describes mainly the local nearest-neighbour distribution 
(we shall show an example of a minimal spanning tree in Sect. 
5 and in Fig. (S). The filamentary network seen by eye com- 
bines both local and large-scale features of the point distribu- 
tion. Moreover, it is well known that in the Sloan Digital Sky 
Survey (SDSS) and in the Two-Degree Field Galaxy Redshift 
Survey (2dFGRS), there are many missing galaxies that have 
not been targete d by the surveys because of a variet y of se- 
lection effects (Pimbbl et et all200ltlc"ross et all2004T) . For in- 
complete samples the MST is not a good choice for analysing 
filaments. 
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Thus, a better notion would be that of the skele- 
ton^ proposed recent ly to describe continuous density fields 
JNovikov et alJl2003h . The skeleton is formed by lines paral- 
lel to the gradient of the field, which connect the saddle points 
to local maxima of the field. Calculating the skeleton, however, 
involves smoothing the point distribution, which will introduce 
an extra parameter, therefore this method is not well suited for 
point distributions. 

We propose to use an automated method to trace filaments 
for realizations of point processes that has been shown to work 
well for the detection of road networks in remote sensing 
JLacoste et al.ll2002l IStoica et al.ll2002l |2004 . This method is 
based on the Candy model, a marked point process, where seg- 
ments serve as marks. As this is the first time such a method is 
used for the galaxy distribution, we describe it in detail below. 
We also test it on 2-D simulated galaxy maps, justifying our 
model choice. The task differs considerably from road network 
detection, as the noise is larger, and we have no continuous 
roads, but sparsely populated ridges instead. 

The present approach allows us to find the length distribu- 
tion for the filaments; we give examples of this distribution for 
different data samples. In this paper, we choose the Candy pro- 
cess parameters by trial and error following a reversible jump 
process. As the method is automated, it can also be used to esti- 
mate those parameters by using maximum likelihood methods; 
these will serve then as new statistics for filament networks. 

2. Marked point processes 

Let (K, S, v) be a measure space, where A" is a compact subset 
of R 2 of strictly positive Lebesgue measure < v(K) < oo and 
S the associated Borel cr-algebra of subsets of K. For n e N let 
K„ be the set of all unordered configurations k = {k\, h%, . . . , k„} 
that consist of n not necessarily distinct points k, e K. Let us 
consider the configuration space Q. = \J°°_ K n equipped with 
the cr-algebra T generated by the mappings {k\, &2, . . . , k„] — > 
YIi=\ e B} counting the number of points in Borel sets B e 
S. A point process on K is a measurable map from a probability 
space into (Q, T). For introductory materia l on poin t processes 
we refer the reader to the textbooks bv lvan Lieshoutlll200(jl) and 
lReissHl993l) . 

The reference measure is given by the unit rate Poisson pro- 
cess that distributes the points in K according to a Poisson pro- 
cess with intensity v. 

Different characteristics or marks may be attached to the 
points. Under these circumstances, we consider a point process 
on K x M as the random sequence x = {{k\ , m\), . . . , (k„, m„)} 
where n eNo, k/ e K and m, e M for all / = 1, . . . ,n. The char- 
acteristics space M is equipped with its corresponding Borel 
cr-algebra and the probability measure v«. A marked point 
process X with locations in K and marks in M is a point process 
onfxM such that the distribution of locations only is a point 
process on K. 

In this case, the reference measure is the unit rate Poisson 
process on KxM, with the locations distributed according to a 
Poisson process with intensity v and i.i.d 1 marks according to 

1 i.i.d. - independent and identically distributed 



vm- When the marks represent parameters of an object, such a 
process is sometimes called an object point process. 

The reference measure exhibits no interaction between 
points or objects. Indeed, we can construct a much more com- 
plicated marked point process by specifying a probability den- 
sity with respect to the reference measure: 

/?(x) = aexp[-£/(x)], (1) 

with a the normalizing constant and t/(x) the interaction en- 
ergy of the system. The energy function is written as the sum 

? 

U(x) = J] J] w°>(jta,...,Jty) (2) 
;'=1 {xn„..,x!j}e& 

where x is a realization of the marked point process, afi' : (K x 
My — > R for /' = l,...,q are the interaction potentials. The 
marked point processes with a probability density of the form 
given by Eq. ([0 are known in physics under the name of Gibbs 
point processes. If there exists a positive real C > such that 
U(x)-U(xl){(k,m)}) < logCforall (k,m) e K xM the process 
is said to be locally stable. 

T his relation implies the Ruelle stability condition dRuelld 
1969), which ensures the integrability of a given probability 
density function. Furthermore, local stability is essential in es- 
tablishing convergence p roofs for the Monte Carlo dynamics 
simulating such a model dGeverl 19991) . 

For our problem, y, the data to be analysed, consist of 
points (galaxies) spread in a finite window K. We want to ex- 
tract the filamentary structure of these data. It is natural to con- 
sider the filaments x we want to detect as a set of random seg- 
ments being the realization of a marked point process. 

The probability density of such a marked point process is 
given by 

Py (x) cc exp[-(t/ y (x) + U r (x))] (3) 

with the terms U y (x) and U r (x) being the data energy and the 
interaction energy, respectively. The first term is related to the 
location of the filaments among the galaxies, whereas the sec- 
ond is related to the geometrical properties of the filaments, 
playing the role of a regularization term. 

The configuration of segments composing the filamentary 
network is estimated by the minimum of the total energy of the 
system 

x = argmin{f/ v (x) + U r (x)}. (4) 

X 

In the following we will present the two components of the 
energy function. Considerations about the simulation of such 
models using the Monte Carlo Markov Chain dynamics will be 
given and a simulated annealing algorithm will be presented. 
Finally, we will apply the model to describe two-dimensional 
filamentary networks of galaxies. 

3. A probabilistic model for the filamentary 
structure of galaxy maps 

3.1. The interaction energy: Candy model 

The filaments we want to extract are composed of non- 
overlapping connected segments. Locally, the curvature of one 
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Fig. 1. Connection and alignment interaction between seg- 
ments. The circle around a segment represents the rejection 
region, whereas the circles around its extremities indicate the 
connection region. The values t and 6 are the curvature values 
allowing segments to align and to cross, respectively. 



filament does not vary too much. In our data we can notice 
just a few short filaments, which can be represented by isolated 
segments. 

Under these considerations a natural choice for the interac- 
tion energy becomes the Candy model, a marked point process 
simulating random networks of segments. Here, a segment is 
seen as a random object f = (k, (6, w, /)) that is characterized 
by its center location k e K and its geometrical parameters 
(6»,w,0 e [0,tt] x [w m i n ,w max ] x [Zmin.ZmaJ = M, represent- 
ing its orientation, width and length respectively. The Candy 
model exhibits three types of interactions between segments: 
connectivity, alignment and rejection. 

Historically speaking, the model was introduced for the 
first time as a prior distribution for th i n network extrac- 
tion i n rem otely sensed images dStoical 1200 it IStoica et"afl 
120021 2004). Properties of the model such as local sta- 
bility and Markovianity, convergence proofs of an adapted 
Metropolis-Hastings dynamics for simulating the model as 
well as p arameter estimation were further investigated in 
Ivan Lieshout & Stoical J2003al) . Different versions of the model 
were analysed and compared for th e special case of road net- 
work detection (Lac oste et al.i2 002). 

A segment has a connection region formed by the union of 
the two circles centered at its extremities and of radius r c . Two 
segments 77 = (k n , (Or,, w n , l n )) and £ = (kf , (% w%, 1$)) are con- 
nected 77 ~ c ( if only one extremity of a segment is in the con- 
nection region of the other segment and if \\6 n - Qg\\ < r. With 
respect to this definition, a segment is doubly connected if both 
of its extremities are connected, singly connected if only one 
of its extremities is connected and free if none of its extremi- 
ties is connected. The Candy model favours doubly connected 
segments whereas free and singly connected segments are pe- 
nalized. 

In Fig.^we show an example of a configuration of seg- 
ments. The free segments are x%, X4, x^ and x-j, this because the 
segment X2 does not fullfil the orientation requirements for the 
connection and the others do not respect the connection condi- 



tion. The segments Xj and x$ are singly connected, whereas the 
segment X\ is doubly connected. 

Similarly, the attraction region of a segment 77 is the union 
of both circles centered at each extremity with a radius r a = 
It,/ 4. Two segments 77 and f exhibit alignment interaction 77 ~ 
£ if d(k,,, k£) > \ max{Z ); , /^}, if only one extremity of a segment 
is in the attraction region of the other segment, and if min{||#, ; - 
0(\\,n - \\6 n - ( \\] > t, with t a threshold value. The Candy 
model penalizes the segments having alignment interaction. 

In the configuration shown in Fig.^^ *o X\ and x$ + a X\, 
while the segments X2 ~ x\ and x\ ~ a X\ because these pairs 
of segments exhibit high differences between their orientations. 

Connectivity is a stronger interaction than alignment. Still, 
as we look for the filaments fitting the data in a random way, 
this last interaction gives us the possibility not to eliminate 
from the current configuration the segments with low data en- 
ergy, which are almost connected. 

Every segment 77 is provided with a rejection region given 
by a circle centered in k,) and of a radius 77 = l n /2. Two seg- 
ments 77 and f exhibit rejection interaction if d(k n , k£) < 
and if \\\\Q n - 6^|| - n/2\\ > 6, where S is a threshold value. The 
Candy model forbids configurations containing rejecting (over- 
lapping) segments. 

If d(kr,,k c ) < ±max{/",,,/ f } and if ||||^ - 6 C \\ - n/2\\ < 6, 
then the segments may cross or form a "T" junction. The con- 
figurations with crossing segments 77 ~ x ( are forbidden by the 
Candy model, whereas the "T" junctions are allowed. 

Clearly, in Fig.[^the segments X\ and X(, do not reject each 
other since they are far enough apart, while the segments x\ 
and x-j do not cross, forming a "T" junction. 

For any configuration of segments x = \x\,...,x„} with 
i = 1 , . . . , n, we are able now to write for the probability density 
of the Candy model 



Ui<jM x i *r Xj}l{Xi + x Xj] 



(5) 



where jj, jf, y s > and y a e (0, 1) are the model parameters, 
»d(x), ny(x), n. s (x) are the numbers of doubly, free and singly 
connected segments, and « G (x) is the number of pairs of seg- 
ments which are not well aligned. In order to avoid too many 
small segments in the configuration, the model favours seg- 
ments covering a large area. Clearly the interaction energy is 
obtained taking t/ r (x) = - log p r (x). 

With respect to the cla s sical de finition of the Candy model 
in Ivan Lieshout & Stoical d2003al) . the model described by 
Eq. (|5j contains differences in the definition of interactions be- 
tween segments. We kept the same name for our model, as we 
believe that the modifications required to apply it to cosmo- 
logical data do not change the basic premises of the classical 
Candy model. Concerning connectivity, the present modifica- 
tions were introduced in order to eliminate some "undesired" 
configurations, such as a segment being connected with itself 
or a segment being connected at one extremity with both ex- 
tremities of another segment. Furthermore, the new modifica- 
tions allow us to build more appropriate tailored moves for the 
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x ir 



W i+k 

Fig. 2. Locating segments in a pattern of points. 

Metropolis-Hastings dynamics simulating the model. The re- 
jection region was extended, as the filaments we observe may 
be rather wide, hence we want to avoid overlapping of seg- 
ments when the data are good enough. This is also the reason 
why the width penalty was introduced. Nevertheless, it is easy 
to prove that under these modifications, together with the one 
concerning the crossing interaction the Candy model is still 
locall y stable and (Ripley-Kelly) Markovian ( Ripl ey & Kellvl 
Il977l) . 

3.2. The data energy 

The data ener gy checks whether a segment belongs to the net- 
work or not l lStoicall200ll IStoica et all [20021 12004 . A seg- 
ment x is considered a part of the filament network, if its ge- 
ometrical shape x covers as many galaxies as possible. Still, 
we want to avoid the case where segments are found in a 
cloud of points rather than in a filament. Clouds of points can 
also be considered as inter-cluster filaments, but we want to 
favour the selection of the shoelace-like filaments, which are 
the more common mor phological type ( Colb erg et alJ 12004: 
Pim bblet & Drinkwaterll2004T) . although other shapes (ribbons 
and sheets) can still be detected by the Candy model. To do 
this, we consider the shadow segments x r and and xi - the seg- 
ments situated to the right and to the left of the segment x, as in 
Fig. 12 The above-mentioned case is avoided if the number of 
galaxies covered by x r and x\ is small. Therefore, let us define 
the quantity v y (x) given by 



v y (x) = 2n(y ni)- n(y n x r ) - n(y n xi), 



(6) 



where n(y ni) is the number of galaxies covered by the geomet- 
rical shape of the segment x. Now, if the following three condi- 
tions: v y (x) > 3, n(y ni) > «(y n x r ), and «(y fli)> n(y n xi) 



Table 1. Parameters of the data sets: a is the cosmological ex- 
pansion factor, n is the number of galaxies, a is the void density 
threshold and /3 is the biasing amplitude. 



Case 


a 


n 


a 


P 


A 


1.0 


4127 


0.5 


0.20 


B 


0.6 


4249 


0.5 


0.18 


C 


0.2 


8879 


1.0 


0.49 



are simultaneously fulfilled, the data energy contribution of a 
segment is V y ({.x}) = -v y (x). If one of the three conditions is 
not fulfilled then V y ({x}) = V m ax, with V max > a positive fixed 
value. 

The total data energy is defined as the sum of the data en- 
ergy contributions of every segment in the configuration 



C7 y (x) = J] Vy({x}) 



(7) 



Greenl Il995l iKendall & Mdllerl l2000t Ivan 
van Lieshout & Stoical2003bHPrestonl 19771) . ' 



3.3. Simulation dynamics and optimization 

The equations Q and @ provide us with all the ingredients 
needed to construct the Gibbs point process given by Eq. 0. 
The estimate of the network (Eq. is obtained by means of a 
simulated annealing algorithm. 

This algorithm iteratively samples the law [p y (x)] t while 
slowly decreasing the temperature T. At high positive temper- 
ature values the state space is explored. When the temperature 
goes down to zero, T — > 0, the configurations of minimal en- 
ergy are reached. A polynomial decreasing scheme Tk+i = cTk 
with c e [0.99, 1.00] may be used for cooling. 

For sampling from a probability density of a point pro- 
cess several Monte Carlo methods are available, such as the 
spatial birth-and-death process, the Metropolis-Hastings and 
reversible jumps dynamics, or the much more recent ex- 
act simulation techniq ues such as coupling f rom the past 
or clan of ancestors JGever & Mailed [l994t iGeverl M999; 

Lieshoutf boori 
The Candy point 
process exhibits rather complicated interactions, hence the 
use of the spatial birth-and-death process or the cited ex- 
act simulation techniques are useful in practice only for a 
limited range of the model parameters. Therefore, for our 
present model we opted for a sampling algorithm based 
on the Metropolis-Hastings dynamics. Details concerning 
the implementation of samplers for the Candy model based 
on Metropoli s-Hastings or reversib l e jump s proces s es ca n 
be found in Ivan Lieshout & Stoical d2003 d) : IStoical d200ll) : 
IStoica et alJ J2002ll2004 ~ 

4. Data 

The Candy process and its applications have been developed 
for 2-D maps. So the natural way to introduce them in cosmol- 
ogy is to consider 2-D cases, also. This will allow us to compare 
the results, and will make it easier to understand the problems 
arising. Our final goal is to apply the Candy process to describe 
3-D networks of filaments, as the large-scale structure maps fill 
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Fig. 5. The simulated galaxy distribution for the three sets of data. Panel B corresponds to the dark matter distribution shown in 
Fig-El In this and the following figures the units of distance are h~ l Mpc, though in a 2-D world. 
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Fig. 3. The spectral density used for the 2-D simulation {Pi), 
the corresponding spectral density for the 3-D case (Pj), and 
the spectral energy per unit logarithmic wavenumber interval 
A 2 , versus the wavenumber k. 



the space. The 3-D network consists of complete filaments, as 
do the 2-D geographical road maps, so the filaments in the test 
data should also be complete. 

The observational galaxy maps showing filaments mainly 
have the geometry of a thin slice. Although such data have been 
used to study the large-scale filamentary structure, the slices do 
not provide proper data for that. The thickness of these slices is 
much smaller than the typical size of a filament, and although 
the maps give a visual impression of filaments, the filaments we 
see are pieces of real filaments, obtained by planar cuts through 
the real 3-D structure. 

Another possibility is to use thicker slices, which can be 
selected, e.g., from the only lar ge-scale contiguou s data set 
for the moment, the 2dF survey (Coll ess et alJl2003l) . But this 
choice carries its own difficulties - thick slices give us the 2-D 
projection of the 3-D network, smearing essential details. 

Simulations of the formation and evolution of large-scale 
structure can also provide us with galaxy maps. As a demon- 
stration that we understand the basic features of the process, 



Fig. 4. Dark matter density (logarithmic scale) for the data set 
B. 



these maps show filamentary structure. How and why an initial 
Gaussian random density field develops filamen ts under self- 
gravitation is well explained by Bond et al. ( 1996). 

The usual simulations give us 3-D universes, but it is easy 
to also simulate the evolution of structure in a 2-D universe. 
This has been done before, to obtain better numerical resolution 
(see, e.g., Be acom et alJl99ll) : we used 2-D simulations to get 
complete cosmological networks of model galaxies. 

The present-day large-scale structure is determined, first, 
by the expansion history of the cosmological model, and, sec- 
ond, by the initial density and velocity fields at the start of the 
simulat ion. We chose the sta ndard 'concordance' cosmological 
model ( Tegm arket al.ll200ll) to describe the expansion. As the 
initial fields are assumed to be Gaussian random fields, they 
are described by their power spectra (the spectral density of the 
density perturb ations P(k), where k is the module of the wave- 
vector; see, e.g., Martinez & Saar 2002). We chose a simple ex- 
pression for the spectral density that d escribes reas onably well 
the Cold Dark Matter (CDM) model Jjenkins etalJll998t) and 
modified it to get the same spectral energy contribution to the 
variance per unit logarithmic wavenumber interval, A 2 {k), in 
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our 2-D universe as in the real 3-D universe. In a 3-D universe 
this quantity is defined as 

Aj(k) = ^P 3 (k)k\ 
and in a 2-D universe as 

the equality of the above quantities gives 

Pi(k) = -P 3 (k) 
n 

(the lower indices show the dimensionality of the space). 
This is the spectral density we used, with P^(k) taken from 
Jen kins et alJ ( I1998I) . Both the spectral densities and the spec- 
tral energy used are shown in Fig. [21 As usual, the wavenum- 
ber is given in units of /z/Mpc where h is the dimension- 
less Hubble parameter, the spectral densities are in units of 
Mpc 3 //! 3 (P 3W), Mpc 2 //i 2 (P 2 (k)), and A 2 (k) is dimensionless. 

We selected the scales and spectrum amplitudes to get a pic- 
ture similar to that we see in 3-D models (the size of the patch 
we modelled was 128/i~'Mpc, and we used a 256 2 grid with the 
same number of cold dark matter particles). These numbers are 
not really important, as this is a mock model, anyway. Then we 
ran a 2-D dynamical A^-body simulation, developing the initial 
perturbations into large-scale structures - the present-day den- 
sity and velocity fields. 

These density fields describe the dark matter content of the 
universe. Populating model universes with galaxies is a com- 
plex problem, but for our purposes simple recipes are suffi- 
cient. We used two well-known properties of the large-scale 
galaxy distribution. First, galaxies avoid large low-density re- 
gions, known as voids; we modeled this by selecting a den- 
sity threshold a (all our densities are given in the units of the 
mean density). In regions with density lower than this thresh- 
old no galaxies were placed. Secondly, galaxy density is biased 
in respect to the dark matter density. We found that the model 
galaxy distribution best resembled the observational maps for 
a nonlinear biasing law: 

Pgai = P VPcdm, Pcdm > a. (8) 

We chose the amplitudes a and /3 to produce approximately the 
same number of galaxies as observed in cosmological slices of 
similar size. 

Finally, we generated a realization of a Cox point process, 
using the galaxy density given by Eq. (|8j as the driving proba- 
bility. In order to see how well the Candy model works in dif- 
ferent situations, we chose three different time moments from 
the simulation, with a different filamentary structure. As the 
earliest of them (our case C) has a very rich set of filaments, 
we generated about twice as many galaxies for that data set as 
for other sets. As usual in cosmology, we characterize the time 
moments by the value of the expansion factor a. This factor 
equals unity at the present epoch, and the earlier the epoch, the 
smaller is the expansion factor (our universe expands). The pa- 
rameters for our three data sets are given in Tabled an d the 
dark matter density and galaxy distributions are compared for 
the set Bin Fig. |H 

All the data sets are shown in Fig- EI 



Table 2. The interaction parameters. 



Parameters 


Sets 


1 


2 


3 


-logr</ 


-10 


-10 


-10 


-log Jf 


8 


7 


7 


-logy., 


2 


2 


1 


-logy„ 


3 


3 


3 


'max 


25 


25 


25 



4.1. Experimental results 

A simulated annealing algorithm was implemented based on 
Metropolis-Hastings dynamics. The parameter for the cooling 
scheme was taken as c — 0.9995 and the initial temperature 
was set to 10. The algorithm was run for 10 7 iterations, and the 
temperature was lowered every 10 3 steps. 

The Candy model has a large number of parameters, and 
these should be chosen carefully in order to get a good rep- 
resentation of the filaments in the data. The segment parame- 
ters (segment lengths and widths) have to be chosen such that 
the model filaments follow those in the data. Thus, for the first 
two data sets, the segment parameters were Z m ; n = 3,Z max = 
5, Wmin = 1, w max = 2; for the third data set, smaller segments 
were considered: Z m j n = 2,Z max = 3,Wmin = 0.95 1.05 
(all distances are given in h Mpc). The interaction regions 
were defined by choosing the radius of the connecting region 
r c = 0.5 and the rejection parameter that forbids segments to 
cross, 8 = 0.1 radians. The orientation parameter, which limits 
the local curvature of filaments, was chosen to be t = 0.5 radi- 
ans for the first two data sets and to be t — 0.75 radians for the 
data set C. 

We experimented with a large number of interaction param- 
eters. Here we show the results for the three sets; they give al- 
most equally good results. The interaction parameters for these 
sets are given in Tabled The optimization method was run for 
each data set. High potentials were given to undesired config- 
urations such as single and free segments, badly aligned pairs 
of segments with respect to the parameter r, and badly placed 
segments with respect to the data term. 

The best filament network and the comparison between the 
three networks for each set of data is shown in Fig. [6] Note that 
we do not use the periodicity of the data — although numerical 
simulations are mostly periodic, the real galaxy distribution is 
not. Thus there is no sense in complicating the numerical pro- 
cedure. 

The best set of parameters for data set A was set 1. 
Examining the top row of Fig. [6] we see that the procedure 
works well. All obvious filaments that one would draw by eye 
are found, and the placement of "secondary" filaments in more 
sparsely populated regions is also good. Note also that galaxy 
concentrations ("clusters") do not destroy the filamentary pat- 
tern; filaments usually branch in these regions. 

The difference between the sets is slight, all parameter sets 
represent the network fairly well. All strong filaments coincide, 
the difference is in the small and weak filaments, built on a few 
points only. This is well seen in the right panel, where all three 
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Fig. 6. Results obtained for the three data sets: A (top row), B (middle row), and C (bottom row). The left panels show the "best 
network" extraction superposed on the data, the right panels show the three networks superposed. The networks for the first 
parameter set are shown by red curves, for set 2 by blue curves, and for set 3 by green curves. 



networks are superposed. Parameter set 2, for instance, gener- 
ates spurious filaments in the sparsely populated upper central 
region of the point distribution, and set 3 produces several very 
short isolated filaments. On the other hand, it also provides a 
perfect branching point at x — 90, y = 30, which sets 1 and 2 
do not find. 

The middle row of Fig.|6]illustrates the filament networks 
found for data set B. This data set has a richer and more uni- 
form selection of filaments than set A. As these sets have ap- 
proximately the same number of galaxies, individual filaments 
in set B are more sparse and harder to identify. Nevertheless, 
the method works well, especially for the parameter set 1, the 



best set; this network is shown in the middle left panel of Fig. [6] 
There are only a few questionable short filaments, e.g. around 
x = 70, y — 120 and x = 10, y = 50. Parameter set 2 generates 
considerably more short isolated filaments, which do not rep- 
resent the data well, and the filaments for parameter set 3 tend 
to deviate in wrong directions. 

Data set C has the richest set of filaments. These are shorter 
and not as pronounced as the filaments in the first two data sets 
— this is the way the large scale structure develops in the uni- 
verse. The early structure that set C describes evolves by con- 
centrating into larger and larger clusters and filaments; small- 
scale structure becomes weaker and disappears gradually. In or- 
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Fig. 7. Filament length distribution histograms for the three data sets (marked in the panels). Solid lines indicate parameter set 1, 
dashed lines set 2, and dotted lines set 3. The combined filament length distribution histograms for the three data sets is shown 
in the bottom right panel. Solid line indicates data set A, dashed line set B, and dotted line set C. 



der to apply the Candy model, we had to generate about twice 
as many galaxies for this set as for the other two. As shown 
in the bottom row of Fig. [6] our procedure delineates the fila- 
mentary network satisfactorily here, too, although probably the 
segments should have been even smaller. As seen in the bottom 
left panel for the best parameter set (set 1 in this case, too), 
segments sometimes jump from an obvious filament to another 
(e.g., at x — 47, y = 20); there is also a tendency to form short 
filaments for a collection of a few points, as at x = 7, y = 60 
and x = 75, y = 107. 

Parameter set 2 is in this case about as good as set 1; it 
misses a few obvious filaments, however (e.g., at x = 124, y = 
50), and has difficulties in resolving interaction regions (knots 
in the network), see the region at x — 70, y = 110. This region 
has been equally difficult to model for all three parameter sets. 
And, finally, parameter set 3 gives the worst filament placement 
among the three. 

4.2. Length distribution 

As the Candy model is able to reconstruct the filamentary net- 
work, given a point process (galaxy map), the collection of 
its parameters can be considered as a description of the net- 
work. When determined from the data by a likelihood proce- 
dure, they can serve as statistics of the network. But there are 



simple statistics we can already study; the simplest one is the 
probability distribution of the lengths of individual filaments 
(sets of connected segments). A similar problem, that of the 
length of the largest filament, has been addressed recently, us- 
ing a pixel-based method to define filaments and finding the 
pixe l size where the fil aments are still statistically significant 
(Bharad wai et alJ l2004). As filaments delineate voids, the dis- 
tribution of the lengths of filaments is also connected with the 
distributi on of void sizes. Th is subject has a long history; see, 
e.g., Martinez & Saai (2002) and an int eresting recent theoret- 
ical paper by Sheth & van de Wevgaert (2004). 

Comparison of the length histograms in Fig.Qreveals a se- 
ries of peaks, several distinct characteristic lengths in the fil- 
amentary network 2 . These peaks are especially prominent for 
data sets A and C, and smeared out for set B. Also, the locations 
of the peaks do not depend much on the specific parameter set, 
the peaks more or less coincide. And although the sample sizes 
are a bit too small for the first two data sets to draw firm con- 
clusions, inspection of the probability distributions shows that 
the peaks are real. 

Another feature of the distributions is their pronounced tails 
- the lengths of filaments reach about 90, almost the full size of 

2 The histograms shown have bins of varying width since they are 
of equal area. Although not common, it is a good, non-parametrical 
way to represent probability densities. 
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the box. Inspection of Fig.[6]shows that long filaments are those 
that pass through the branching points and are really collections 
of several filaments. So, in the future we have to find a recipe 
for locating the branching points and breaking the filaments; 
otherwise we shall lose connection with the void distribution. 
For the histograms in Fig. 0this means that there would be 
additional contributions to the 20-40 length range, which are 
presently missing. 

As the locations of the peaks are almost independent of the 
parameter set used, we combined the length data for the three 
parameter sets. These distributions for the three data sets are 
compared in the bottom right panel of Fig.0 Thus these peaks 
are significant, revealing discrete scales in the data. We also see 
that the overall length distribution is shifted to the shorter sizes 
for set C, compared with A and B. 

5. Other approaches 

There exist only a few methods to describe the observed fila- 
mentary network of galaxies. The be st known approach is that 
of minimal spanning trees (MST) (Harr ow etalill 984 . The 
minimal spanning tree connects all data points, and its length 
distribution function describes mainly the nearest-neighbour 
distance distributions, not the large-scale network we see. The 
MST also has to connect all points in the clusters, while the 
Candy model can be tuned to ignore them (as we have seen, 
clusters usually become branching points of the filament net- 
work in the Candy model). The differences between the MST 
and the Candy model can be well seen in Fig. [8] Nevertheless, 
the MST has been used extensively to describe the filamentary 
network; as the Candy model looks much better and is well 
suited for incomplete samples, it should lead to a better de- 
scription of the cosmic filaments. 

Using a technique ba sed on multiscale geometric analysis 
lArias-Castro et al] J2003h have showed how filaments can be 
detected when they are embedded in a uniformly distributed 
background of points. This algorithm is specially aimed at find- 
ing hidden filamentary patterns in images or nearly Poissonian 
point processes. Our approach is different and is better suited 
for finding many filaments in clustered point processes, be- 
cause the Candy model produces smoother maps and is able 
to better combine both local and large scale characteristics of 
the galaxy distribution. 

Another, more rec ent approach to describing filaments 
(Bharad wai et alj E004) proceeds by binning the map (calcu- 
lating a density field), and using Minkowski functionals of the 
isodensity contours to estimate the filamentarity of the objects. 
While this approach will classify all objects, it has two free pa- 
rameters, the smoothing length (size of the density bin), and the 
isodensity level. True, in some respects our approach is similar 
to that, as the segments of the Candy model have a finite width 
(we are also estimating a density field). But our density estima- 
tor is anisotropic and adaptive, in principle, and we trace only 
filamentary structures. 

A third approach that is also based on a density field, is 
to determine the saddle points and to build a network of field 
lines (directed along the gradient of the field), connecting sad- 
dle points with local maxima - the skeleton jNovikov et alJ 



2003) . This approach could reconstruct well the cellular net- 
work of filaments (so far it has been applied to studyin g the 
pixeli zed cosmic microwave background data bvl Eriksen et alJ 

2004) , but it will also depend on the density estimation proce- 
dure. And, as the authors say, this approach is computationally 
complex. 

6. Conclusion and perspectives 

The parameter values for our method were chosen by trial and 
error. Under these circumstances, parameter estimation using 
Monte Carlo maximum likelihood methods may be considered 
jGeverl 19991; Ivan Lieshout & Stoical2003al) . These parameters 
could then be considered as statistics describing the filamen- 
tary network. They will certainly be much better suited for this 
task than the moments of the density distribution in real or in 
Fourier space which are commonly used in cosmology. 

The data term is a very simple test. Much more sophisti- 
cated techniques such as testing the alignment of the points 
covered by a segment, or statistical tests such as the complete 
randomness test need investigation. 

To our knowledge there is no proof for the existence of an 
optimal cooling scheme when using Metropolis-Hastings dy- 
namics for simulating point processes in a simulated anneal- 
ing algorithm. There is such a proof for the spatial birth-and- 
death process, but in practice t he authors sample t he model us- 
ing a fixed cold temperature ( Ivan Lies hout 1994). The choice 
we opted for, a slow polynomial decreasing scheme, does not 
guarantee that the global optimum is reached. 

But overall, as we have seen, the results are good, the 
Candy model can be tuned to trace the filamentary network 
well. And it can be naturally generalized to des cribe the real 
3-D f ilamentary networks of galaxy maps; see iGregori et alJ 
(2004|). As we already said above, it can also be considered 
as a tool for providing statistics of filamentary networks. These 
are the future directions of our work. 
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